RcometsAnalytics supports all cohort-specific analyses of the COMETS
consortium. This collaborative work is done via the COMETS harmonization
group activities. For more information, see the COMETS website.
This vignette demonstrates using the RcometsAnalytics R package from the
command line, while
the tutorial
demonstrates using RcometsAnalytics from the GUI. Documentation of the
RcometsAnalytics R package can be found here manual.
Each project could create their own vignette to run the analyses.
The required input file should be in excel format with the following 6 sheets:
Complete documentation of the various sheets can be found in the package documentation. An example input file is available HERE.
Only empty cells in any excel sheet become missing values when the R software reads the sheet into a data frame. Any non-numeric value in a cell for a continuous variable will result in an error.
The first step of the analysis is to load in the data with the
readCOMETSinput() function.
Input for this function is an Excel spreadsheet, per the description
above.
# Retrieve the full path of the input data
dir <- system.file("extdata", package="RcometsAnalytics", mustWork=TRUE)
csvfile <- file.path(dir, "cohort_1.xlsx")
# Read in and process the input data
exmetabdata <- RcometsAnalytics::readCOMETSinput(csvfile)
## VarMap sheet is read in.
## Metabolites sheet is read in.
## SubjectMetabolites sheet is read in.
## SubjectData sheet is read in.
## Models sheet is read in.
## Model_Types sheet is read in.
## There are 16 categorical variables.
## Running Integrity Check...
## Joining with `by = join_by(hmdb_id)`
## Begin testing models in Models sheet...
## Filtering subjects according to the rule(s) age< 70. 361 of 432 are retained.
## Warning in runModel.addRemVars(rem.obj, vars[oneVal], vars.type, "too few
## unique non-missing values", : The variable(s) female, fasted have been removed
## from adjvars because of: too few unique non-missing values
## Warning in runModel.addRemVars(rem.obj, tmp[rem], varSet, "linearly dependent",
## : The variable(s) multivitamin.2 have been removed from adjvars because of:
## linearly dependent
## Finished testing models in Models sheet.
To plot some the distribution of variances for each metabolite:
RcometsAnalytics::plotVar(exmetabdata,titlesize=12)
## Warning: The titlefont attribute is deprecated. Use title = list(font = ...)
## instead.
To plot the distribution of minimum/missing values:
RcometsAnalytics::plotMinvalues(exmetabdata,titlesize=12)
## Warning: The titlefont attribute is deprecated. Use title = list(font = ...)
## instead.
There are 2 ways to specify your model, batch or interactive. In Batch mode, models are specified in your input file Models sheet. The model information needs to be read in with the function getModelData() and processed so the software knows which models to run. The following call defines the “1 Age” model from the Models sheet in the input file to be run.
exmodeldata <- RcometsAnalytics::getModelData(exmetabdata,modlabel="1 Age")
In Interactive mode, models are specified as parameters. The model
information needs to be read in with the function
getModelData() and processed so the software knows which models
to run.
The following call defines the model with age and bmi_grp as the
exposure variables, and includes only the subjects with age < 70 and
horm_curr < 2. This is a manual way to specify the “5 Age
Multivariable adjusted stratified subset” model.
exmodeldata2 <- RcometsAnalytics::getModelData(exmetabdata, modelspec="Interactive",
exposures=c("female", "smk_grp", "bmi_grp", "race_grp",
"educ_grp", "alc_grp", "multivitamin", "fasted"),
where=c("age<70","horm_curr<2"))
## Filtering subjects according to the rule(s) age<70 AND horm_curr<2. 240 of 432 are retained.
The runModel() function is the main function for running a single model, and by default, a correlation analysis is performed. The string “DPP” is a label for the cohort which will appear under the “cohort” column in the output.
excorrdata <- RcometsAnalytics::runModel(exmodeldata,exmetabdata,"DPP")
The output of the correlation analysis can then be compiled and output to an Excel file with the following function:
RcometsAnalytics::OutputListToExcel(filename="DPP_corr.xlsx", excorrdata)
To view the first 3 lines of the correlation analysis output, simply type:
RcometsAnalytics::showModel(excorrdata,nlines=3)
##
## ModelSummary:
## run outcomespec exposurespec term nobs message adjvars
## 1 1 tyrosine age 432
## 2 2 umbelliferone_sulfate age 432
## 3 3 undecanedioate age 432
## adjvars.removed adjspec outcome_uid
## 1 HMDB00158#HMDB00158_B1#HMDB00158_B2
## 2 CHEM100006282
## 3 HMDB00888
## outcome exposure_uid exposure adj_uid
## 1 L-Tyrosine#L-Tyrosine#L-Tyrosine age Age at Entry
## 2 Umbelliferone sulfate or X - 18241 age Age at Entry
## 3 Undecanedioic acid age Age at Entry
## metabolite_name
## 1 tyrosine
## 2 umbelliferone sulfate
## 3 undecanedioate
##
## Effects:
## run outcomespec exposurespec term estimate pvalue
## 1 1 tyrosine age age -0.055482497 0.2498444
## 2 2 umbelliferone_sulfate age age 0.077841119 0.1061674
## 3 3 undecanedioate age age -0.002051102 0.9660938
## metabolite_name
## 1 tyrosine
## 2 umbelliferone sulfate
## 3 undecanedioate
##
## Errors_Warnings:
## [1] type object message
## <0 rows> (or 0-length row.names)
##
## Table1:
## variable in.model type n n.unique min quartile1 median mean
## 1 age exposure continuous 432 20 55 59 62 63.09259
## quartile3 max n.missing
## 1 67 74 0
##
## Info:
## name value
## 1 date 2023-10-20 10:22:36
## 2 cohort DPP
## 3 RcometsAnalytics version 2.9.0.9
## NULL
To display the heatmap of the resulting correlation matrix, use the showheatmap function.
RcometsAnalytics::showHeatmap(excorrdata)
To display the hierarchical clustering of the resulting correlation
matrix, use the showHClust function. This diplay requires at least 2
rows and 2 columns in the correlation matrix.
exmodeldata<-RcometsAnalytics::getModelData(exmetabdata,modelspec = "Interactive",exposures = c("age", "bmi_grp"))
excorrdata <- RcometsAnalytics::runModel(exmodeldata,exmetabdata,"DPP")
RcometsAnalytics::showHClust(excorrdata, showticklabels=FALSE)
A stratified correlation analysis can be performed by specifiying stratification variables in the call to getModelData(). If more than one stratification variable is specified, then the strata will be defined by all unique combinations of the stratification variables. The following call will define a model stratified by bmi_grp. You can add multivariable adjustments as “adjvars”. This corresponds to “4 Age Multivariable adjusted stratified” in the example sheet.
exmodeldata2 <- RcometsAnalytics::getModelData(exmetabdata,modelspec="Interactive",
outcomes=c("tyrosine","uracil"),
exposures=c("age"),
adjvars=c("smk_grp", "race_grp", "educ_grp", "alc_grp",
"horm_curr"),
strvars="bmi_grp")
The stratified, fully adjusted correlation analysis is run by calling the runModel() function.
excorrdata2 <- RcometsAnalytics::runModel(exmodeldata2,exmetabdata,"DPP")
Correlation analysis can be performed in a subset by specifiying subset variables in the call to getModelData(). The following call will define a model restricted to individuals under 70 years of age. This corresponds to “5 Age Multivariable adjusted stratified subset” in the example sheet.
exmodeldata2 <- RcometsAnalytics::getModelData(exmetabdata,modelspec="Interactive",
outcomes=c("tyrosine","uracil"),
exposures=c("age"),
adjvars=c("female", "smk_grp", "bmi_grp", "race_grp", "educ_grp",
"alc_grp", "multivitamin", "fasted"),
strvars="horm_curr",
where = "age<70")
## Filtering subjects according to the rule(s) age<70. 361 of 432 are retained.
The stratified, fully adjusted correlation analysis is run by calling the runModel() function.
excorrdata2 <- RcometsAnalytics::runModel(exmodeldata2,exmetabdata,"DPP")
## Warning in runModel.addRemVars(rem.obj, vars[oneVal], vars.type, "too few
## unique non-missing values", : The variable(s) female, fasted have been removed
## from adjvars because of: too few unique non-missing values
## Warning in runModel.addRemVars(rem.obj, tmp[rem], varSet, "linearly dependent",
## : The variable(s) multivitamin.2 have been removed from adjvars because of:
## linearly dependent
## Warning in runModel.addRemVars(rem.obj, vars[oneVal], vars.type, "too few
## unique non-missing values", : The variable(s) female, fasted have been removed
## from adjvars because of: too few unique non-missing values
## Warning in runModel.addRemVars(rem.obj, tmp[rem], varSet, "linearly dependent",
## : The variable(s) multivitamin.2 have been removed from adjvars because of:
## linearly dependent
## Warning in runModel.addRemVars(rem.obj, vars[oneVal], vars.type, "too few
## unique non-missing values", : The variable(s) female, fasted have been removed
## from adjvars because of: too few unique non-missing values
## Warning in runModel.addRemVars(rem.obj, tmp[rem], varSet, "linearly dependent",
## : The variable(s) multivitamin.2 have been removed from adjvars because of:
## linearly dependent
Call getModelData() to define the “7 Poisson regression” model, has n_visits as the outcome variable, and has tyrosine and uracil as the exposure variables. The variable n_visits must be a non-negative integer valued variable.
exmodeldata <- RcometsAnalytics::getModelData(exmetabdata,modelspec="Interactive", adjvars="age",
outcomes="n_visits", exposures=c("tyrosine","uracil"))
To run a Poisson regression, the list of options op must also include a model.options list with family set to “poisson”.
op <- list(model="glm", model.options=list(family="poisson"))
poisson_results <- RcometsAnalytics::runModel(exmodeldata, exmetabdata, "DPP", op=op)
print(poisson_results)
## $ModelSummary
## run outcomespec exposurespec converged term wald.pvalue nobs message
## 1 1 n_visits tyrosine 1 tyrosine 0.7435891 432
## 2 2 n_visits uracil 1 uracil 0.5321560 432
## adjvars adjvars.removed exposure.covariances adjspec outcome_uid outcome
## 1 age age n_visits n_visits
## 2 age age n_visits n_visits
## exposure_uid exposure adj_uid
## 1 HMDB00158#HMDB00158_B1#HMDB00158_B2 L-Tyrosine#L-Tyrosine#L-Tyrosine age
## 2 HMDB00300 Uracil age
## metabolite_name
## 1 tyrosine
## 2 uracil
##
## $Effects
## run outcomespec exposurespec term estimate std.error statistic
## 1 1 n_visits tyrosine tyrosine 0.04930922 0.15074471 0.3271041
## 2 2 n_visits uracil uracil 0.01390407 0.02225654 0.6247183
## pvalue estimate.lower estimate.upper metabolite_name
## 1 0.7435891 -0.24614499 0.34476343 tyrosine
## 2 0.5321560 -0.02971794 0.05752607 uracil
##
## $Errors_Warnings
## [1] type object message
## <0 rows> (or 0-length row.names)
##
## $Table1
## variable in.model type n n.unique min quartile1 median mean
## 1 n_visits outcome continuous 432 6 0 1 2 2.050926
## 2 age adjustment continuous 432 20 55 59 62 63.092593
## quartile3 max n.missing
## 1 3 5 0
## 2 67 74 0
##
## $Info
## name
## 1 date
## 2 cohort
## 3 RcometsAnalytics version
## 4 R version
## 5 operating system
## 6 input file
## 7 model name
## 8 run type
## 9 op$check.cor.method
## 10 op$check.cor.cutoff
## 11 op$check.nsubjects
## 12 op$max.nstrata
## 13 op$model
## 14 op$output.Effects
## 15 op$output.ModelSummary
## 16 op$output.ci_alpha
## 17 op$output.metab.cols
## 18 op$output.type
## 19 op$output.merge
## 20 op$chemEnrich
## 21 op$chemEnrich.adjPvalue
## 22 op$max.npairwise
## 23 model.options$family
## 24 model.options$link
## 25 outcome
## 26 exposure
## 27 strata
## 28 exposurerefs
## value
## 1 2023-10-20 10:22:41
## 2 DPP
## 3 2.9.0.9
## 4 R version 4.1.2 (2021-11-01)
## 5 Windows 10 x64 (build 19045)
## 6 C:/Users/nicol/OneDrive/Documents/R/win-library/4.1/RcometsAnalytics/extdata/cohort_1.xlsx
## 7
## 8 Interactive
## 9 spearman
## 10 0.97
## 11 25
## 12 10
## 13 glm
## 14 exposure
## 15 anova
## 16 0.95
## 17 metabolite_name
## 18 xlsx
## 19 none
## 20 0
## 21 0.05
## 22 1000
## 23 poisson
## 24 log
## 25 n_visits
## 26 *
## 27
## 28
##
## attr(,"ptime")
## [1] "Processing time: 0.41 sec"
## attr(,"class")
## [1] "runModel"
Call getModelData() to define a model which adjusts for age, has nested_case as the outcome variable, and has tyrosine and uracil as the exposure variables. The variable nested_case must be a binary 0-1 variable.
exmodeldata <- RcometsAnalytics::getModelData(exmetabdata,modelspec="Interactive", adjvars="age",
outcomes="nested_case", exposures=c("tyrosine","uracil"))
To run a logistic regression, the list of options op must also include a model.options list with family set to “binomial”.
op <- list(model="glm", model.options=list(family="binomial"))
glm_results <- RcometsAnalytics::runModel(exmodeldata, exmetabdata, "DPP", op=op)
print(glm_results)
## $ModelSummary
## run outcomespec exposurespec converged term wald.pvalue nobs message
## 1 1 nested_case tyrosine 1 tyrosine 0.6986625 432
## 2 2 nested_case uracil 1 uracil 0.4131957 432
## adjvars adjvars.removed exposure.covariances adjspec outcome_uid outcome
## 1 age age nested_case nested_case
## 2 age age nested_case nested_case
## exposure_uid exposure adj_uid
## 1 HMDB00158#HMDB00158_B1#HMDB00158_B2 L-Tyrosine#L-Tyrosine#L-Tyrosine age
## 2 HMDB00300 Uracil age
## metabolite_name
## 1 tyrosine
## 2 uracil
##
## $Effects
## run outcomespec exposurespec term estimate std.error statistic
## 1 1 nested_case tyrosine tyrosine 0.16789291 0.433690 0.3871266
## 2 2 nested_case uracil uracil 0.05244784 0.064095 0.8182829
## pvalue exp.estimate exp.std.error estimate.lower estimate.upper
## 1 0.6986625 1.182810 0.51297282 -0.68212383 1.0179097
## 2 0.4131957 1.053848 0.06754636 -0.07317605 0.1780717
## exp.estimate.lower exp.estimate.upper metabolite_name
## 1 0.5055422 2.767404 tyrosine
## 2 0.9294372 1.194911 uracil
##
## $Errors_Warnings
## [1] type object message
## <0 rows> (or 0-length row.names)
##
## $Table1
## variable in.model type n n.outcomeEqual0 n.outcomeEqual1
## 1 nested_case outcome continuous 432 221 211
## 2 age adjustment continuous 432 221 211
## n.unique min quartile1 median mean quartile3 max n.missing
## 1 2 0 0 0 0.4884259 1 1 0
## 2 20 55 59 62 63.0925926 67 74 0
##
## $Info
## name
## 1 date
## 2 cohort
## 3 RcometsAnalytics version
## 4 R version
## 5 operating system
## 6 input file
## 7 model name
## 8 run type
## 9 op$check.cor.method
## 10 op$check.cor.cutoff
## 11 op$check.nsubjects
## 12 op$max.nstrata
## 13 op$model
## 14 op$output.Effects
## 15 op$output.ModelSummary
## 16 op$output.ci_alpha
## 17 op$output.metab.cols
## 18 op$output.type
## 19 op$output.merge
## 20 op$chemEnrich
## 21 op$chemEnrich.adjPvalue
## 22 op$max.npairwise
## 23 model.options$family
## 24 model.options$link
## 25 outcome
## 26 exposure
## 27 strata
## 28 exposurerefs
## value
## 1 2023-10-20 10:22:41
## 2 DPP
## 3 2.9.0.9
## 4 R version 4.1.2 (2021-11-01)
## 5 Windows 10 x64 (build 19045)
## 6 C:/Users/nicol/OneDrive/Documents/R/win-library/4.1/RcometsAnalytics/extdata/cohort_1.xlsx
## 7
## 8 Interactive
## 9 spearman
## 10 0.97
## 11 25
## 12 10
## 13 glm
## 14 exposure
## 15 anova
## 16 0.95
## 17 metabolite_name
## 18 xlsx
## 19 none
## 20 0
## 21 0.05
## 22 1000
## 23 binomial
## 24 logit
## 25 nested_case
## 26 *
## 27
## 28
##
## attr(,"ptime")
## [1] "Processing time: 0.4 sec"
## attr(,"class")
## [1] "runModel"
Call getModelData() to define a model which adjusts for age, has event as the outcome variable, time as the time-to-event variable, and has tyrosine and uracil as the exposure variables. The variable event must be binary, coded as 0 for non-events and 1 for events. The variable time must be positive.
exmodeldata <- RcometsAnalytics::getModelData(exmetabdata,modelspec="Interactive", adjvars="age",
outcomes="event", timevar="time", exposures=c("tyrosine","uracil"))
To run a survival model, the list of options op must also include a model.options list with model set to “coxph”.
op <- list(model="coxph")
coxph_results <- RcometsAnalytics::runModel(exmodeldata, exmetabdata, "DPP", op=op)
print(coxph_results$ModelSummary)
## run outcomespec exposurespec term wald.pvalue nobs message adjvars
## 1 1 event tyrosine tyrosine 0.2699664 432 age
## 2 2 event uracil uracil 0.1078152 432 age
## adjvars.removed exposure.covariances adjspec outcome_uid outcome
## 1 age event event
## 2 age event event
## exposure_uid exposure adj_uid
## 1 HMDB00158#HMDB00158_B1#HMDB00158_B2 L-Tyrosine#L-Tyrosine#L-Tyrosine age
## 2 HMDB00300 Uracil age
## metabolite_name
## 1 tyrosine
## 2 uracil
Call getModelData() to define a model which adjusts for age, has nested_case as the outcome variable, matchedSet as the group variable, and has tyrosine and uracil as the exposure variables. The variable nested_case must be binary, coded as 0 for controls and 1 for cases. The variable matchedSet defines the matched sets of groups in the data.
exmodeldata <- RcometsAnalytics::getModelData(exmetabdata,modelspec="Interactive", adjvars="age",
outcomes="nested_case", groupvar="matchedSet", exposures=c("tyrosine","uracil"))
To run a survival model, the list of options op must also include a model.options list with model set to “clogit”.
op <- list(model="clogit")
clogit_results <- RcometsAnalytics::runModel(exmodeldata, exmetabdata, "DPP", op=op)
print(clogit_results$ModelSummary)
## run outcomespec exposurespec term wald.pvalue nobs message adjvars
## 1 1 nested_case tyrosine tyrosine 0.5293571 432 age
## 2 2 nested_case uracil uracil 0.3065695 432 age
## adjvars.removed exposure.covariances adjspec outcome_uid outcome
## 1 age nested_case nested_case
## 2 age nested_case nested_case
## exposure_uid exposure adj_uid
## 1 HMDB00158#HMDB00158_B1#HMDB00158_B2 L-Tyrosine#L-Tyrosine#L-Tyrosine age
## 2 HMDB00300 Uracil age
## metabolite_name
## 1 tyrosine
## 2 uracil
All models desginated in the input file can be run with one command, and individual output Excel files or correlation results will be written in the current directory by default. The function returns a list of objects.
allresults <- RcometsAnalytics::runAllModels(exmetabdata,writeTofile=TRUE)
Try running a model using the variables in the VarMap and your own outcomes/predictors in the Interactive mode of RcometsAnalytics!
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-153 lubridate_1.9.2
## [3] webshot_0.5.5 RColorBrewer_1.1-3
## [5] httr_1.4.6 tools_4.1.2
## [7] backports_1.4.1 bslib_0.5.0
## [9] ISwR_2.0-8 utf8_1.2.3
## [11] R6_2.5.1 rpart_4.1-15
## [13] lazyeval_0.2.2 colorspace_2.1-0
## [15] nnet_7.3-16 withr_2.5.1
## [17] tidyselect_1.2.0 gridExtra_2.3
## [19] mnormt_2.1.1 compiler_4.1.2
## [21] cli_3.6.1 TSP_1.2-4
## [23] plotly_4.10.2 labeling_0.4.3
## [25] sass_0.4.6 scales_1.2.1
## [27] psych_2.3.6 stringr_1.5.0
## [29] digest_0.6.33 rmarkdown_2.23
## [31] ca_0.71.1 pkgconfig_2.0.3
## [33] htmltools_0.5.5 parallelly_1.36.0
## [35] fastmap_1.1.1 htmlwidgets_1.6.2
## [37] rlang_1.1.1 readxl_1.4.3
## [39] rstudioapi_0.15.0 farver_2.1.1
## [41] jquerylib_0.1.4 generics_0.1.3
## [43] jsonlite_1.8.7 crosstalk_1.2.0
## [45] dendextend_1.17.1 dplyr_1.1.2
## [47] ModelMetrics_1.2.2.2 magrittr_2.0.3
## [49] Matrix_1.5-4.1 Rcpp_1.0.11
## [51] munsell_0.5.0 fansi_1.0.4
## [53] viridis_0.6.3 lifecycle_1.0.3
## [55] stringi_1.7.12 pROC_1.18.4
## [57] yaml_2.3.7 MASS_7.3-54
## [59] plyr_1.8.8 recipes_1.0.6
## [61] grid_4.1.2 parallel_4.1.2
## [63] listenv_0.9.0 ppcor_1.1
## [65] lattice_0.20-45 splines_4.1.2
## [67] knitr_1.43 pillar_1.9.0
## [69] corpcor_1.6.10 future.apply_1.11.0
## [71] reshape2_1.4.4 codetools_0.2-18
## [73] stats4_4.1.2 glue_1.6.2
## [75] evaluate_0.21 data.table_1.14.8
## [77] vctrs_0.6.3 foreach_1.5.2
## [79] cellranger_1.1.0 gtable_0.3.4
## [81] purrr_1.0.1 tidyr_1.3.0
## [83] future_1.33.0 heatmaply_1.4.2
## [85] assertthat_0.2.1 cachem_1.0.8
## [87] ggplot2_3.4.2 xfun_0.39
## [89] gower_1.0.1 prodlim_2023.03.31
## [91] RcometsAnalytics_2.9.0.12 broom_1.0.5
## [93] class_7.3-19 survival_3.2-13
## [95] viridisLite_0.4.2 timeDate_4022.108
## [97] seriation_1.4.2 tibble_3.2.1
## [99] iterators_1.0.14 registry_0.5-1
## [101] hardhat_1.3.0 lava_1.7.2.1
## [103] timechange_0.2.0 globals_0.16.2
## [105] ellipsis_0.3.2 caret_6.0-94
## [107] subselect_0.15.4 ipred_0.9-14